Classification of Active and Weakly Active ST Inhibitors of HIV-1 Integrase Using a Support Vector Machine

Yan, A.X.*; Xuan, S.Y.; Hu, X.Y.
Combinatorial Chemistry & High Throughput Screening, 2012, 15(10), 792-805.

Read the original article

Download raw data

Download Supporting Information

    Using a support vector machine (SVM), two computational models were built to predict whether a compound is an active or a weakly active strand transfer (ST) inhibitor, based on a dataset of 1257 ST inhibitors of HIV-1 integrase. The model built with MACCS fingerprints gave a prediction accuracy of 91.82% and a Matthews correlation coefficient (MCC) of 0.73 on the test set, and the model built with 40 MOE descriptors gave a prediction accuracy of 93.64% and an MCC of 0.79 on the test set. Molecular properties such as electrostatic properties, van der Waals surface area, hydrogen-bonding properties and the number of fluorine atoms are important factors influencing the interactions between the inhibitor and the integrase. Scaffolds such as β-diketo acid and its derivatives, naphthyridine carboxamide or its isosteres, and pyrimidinones may play a crucial role in the activity of HIV-1 integrase inhibitors.

Classification model performance: Dataset 1 (1257 HIV-1 integrase ST inhibitors)

| Model name | Algorithm | Descriptors | Training set accuracy (%) | Training set 5-fold CV accuracy (%) | Training set 10-fold CV accuracy (%) | Training set LOO CV accuracy (%) | Test set SE (%) | Test set SP (%) | Test set accuracy (%) | Test set MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 | SVM | 166 MACCS | 87.64 | 80.66 | 80.78 | 80.66 | 97.73 | 67.82 | 91.82 | 0.73 |
| Model 2 | SVM | 40 MOE | 90.33 | 81.03 | 80.66 | 81.03 | 98.58 | 73.56 | 93.64 | 0.79 |

CV: cross-validation; LOO: leave-one-out; SE: sensitivity; SP: specificity; MCC: Matthews correlation coefficient.

Dataset 2: 299 hydroxylamine derivative inhibitors

| Model name | Algorithm | Descriptors | Training set R2 | Training set RMSE | Test set R2 | Test set RMSE |
|---|---|---|---|---|---|---|
| Model 3A | RF | 25 RDKit descriptors | 0.89 | 0.37 | 0.71 | 0.53 |
| Model 3B | SVM | 16 RDKit descriptors | 0.84 | 0.38 | 0.64 | 0.6 |
| Model 3C | RF | 22 RDKit descriptors | 0.78 | 0.43 | 0.61 | 0.56 |
| Model 3D | SVM | 18 RDKit descriptors | 0.8 | 0.41 | 0.65 | 0.62 |
| Model 3E | DNN | 68 RDKit descriptors | 0.90 | 0.30 | 0.69 | 0.59 |
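
The regression models above are scored with R2 and RMSE. The page does not state which definitions were used, so the following is a minimal sketch under a stated assumption: R2 is the coefficient of determination, 1 - SS_res/SS_tot, and RMSE is the root of the mean squared residual. The function name and the four data points are hypothetical and are not drawn from the dataset.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R2, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical observed and predicted activity values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(observed, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.3f}")
```

RMSE is expressed in the units of the predicted quantity, so it can only be compared across models fitted to the same response variable. R2 is unitless but depends on how widely the observed values are spread.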

Main project members

Xuan Shouyi (宣首逸)

PhD student

Hu Xiaoying (胡小英)

PhD student